Optimizing large parameter sets in variational quantum Monte Carlo 
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We present a technique for optimizing hundreds of thousands of variational parameters in varia- 
tional quantum Monte Carlo. By introducing iterative Krylov subspace solvers and by multiplying 
by the Hamiltonian and overlap matrices as they are sampled, we remove the need to construct 
and store these matrices and thus bypass the most expensive steps of the stochastic reconfiguration 
and linear method optimization techniques. We demonstrate the effectiveness of this approach by 
using stochastic reconfiguration to optimize a correlator product state wavefunction with a pfaffian 
reference for four example systems. In two examples on the two dimensional Hubbard model, we 
study 16 and 64 site lattices, recovering energies accurate to 1% in the smaller lattice and predicting 
particle-hole phase separation in the larger. In two examples involving an ab initio Hamiltonian, we 
investigate the potential energy curve of a symmetrically dissociated 4x4 hydrogen lattice as well 
as the singlet-triplet gap in free base porphin. In the hydrogen system we recover 98% or more 
of the correlation energy at all geometries, while for porphin we compute the gap in a 24 orbital 
active space to within 0.02eV of the exact result. The numbers of variational parameters in these 
examples range from 4 x 10 3 to 5 x 10 5 , demonstrating an ability to go far beyond the reach of 
previous formulations of stochastic reconfiguration. 
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I. INTRODUCTION 

Quantum Monte Carlo (QMC) is a powerful technique 
for extracting predictions from the electronic Schrodinger 
equation [1 j. The variational (VMC) and diffusion 
(DMC) Monte Carlo methods in particular can produce 
highly accurate predictions provided that a sufficiently 
flexible trial wavefunction is available and that the varia- 
tional parameters of this wavefunction can be optimized. 
However, VMC and DMC suffer from the major limita- 
tion that the most effective stochastic optimization algo- 
rithms cannot handle more than a few thousand varia- 
tional parameters. These algorithms, which include the 
Newton [2J, approximate Newton [3 j, linear (LM) [4}{7] 
and stochastic reconfiguration (SR) [8] methods, are cur- 
rently constrained by their need to build and store ma- 
trices that become unmanageable when the number of 
variational parameters becomes large. Other stochastic 
optimization algorithms [9] that rely only on stochastic 
estimates for the energy gradient can treat more vari- 
ational parameters, but their steepest-descent character 
makes for less efficient convergence to the energy mini- 
mum, especially compared to the LM. In order to make 
effective use of sophisticated trial wavefunctions such as 
tensor networks, which can contain millions of variational 
parameters, it is imperative that more capable optimiza- 
tion methods be developed. 

The LM and SR optimization methods reduce to solv- 
ing either a system of linear equations or a linear eigen- 
value problem in which the matrices in question are de- 
termined by stochastic sampling. The essential difficulty 
in this approach is that the dimension of these matrices 



is equal to the number of variational parameters, pre- 
venting their construction when there are more than a 
few thousand variables. Here we propose solving the op- 
timization methods' central linear problems using iter- 
ative Krylov subspace algorithms, which do not require 
the matrices to be built explicitly. Instead, these solvers 
require that one evaluate matrix-vector products, which 
we will show to be far less difficult than actually build- 
ing the relevant matrices. In VMC this approach is made 
particularly efficient by the strategy of operating by the 
matrices during the sampling process, as each sampled 
configuration contributes an outer product to the over- 
all matrix, and outer products are particularly easy to 
operate by. 

In this paper we will demonstrate this approach by 
using the conjugate gradient (CG) iterative solver to im- 
prove the SR method. We also derive a method for im- 
proving the LM using the generalized Davidson solver, 
although we will only present numerical results for SR 
(a computer implementation for the LM is underway). 
We will begin by developing the theory for the acceler- 
ated SR and LM and also for the particular wavefunction 
ansatz that we employ. After developing the theory, we 
will present numeric results for the SR method in four 
examples: (a) the Hubbard model on a 4x4 lattice, (b) 
phase separation behavior in the 8x8 Hubbard model, 
(c) the potential energy curve of a symmetrically dissoci- 
ated 4x4 hydrogen lattice, and (d) the singlet-triplet gap 
of free base porphin. Note that the numerical studies 
carried out here are primarily concerned with the opti- 
mization problem. A detailed examination of the physics 
of these examples will be carried out elsewhere. 
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II. THEORY 

A. Accelerated Stochastic Reconfiguration 

The SR method can be viewed as an approximate 
imaginary time evolution in a specially chosen sub- 
space Q of the full Hilbert space. For a wavefunc- 
tion |\P(ai, Q2, . . .)) with variational parameters a, this 
subspace is spanned by the wavefunction and its a- 
derivatives, 

tt = span( |*°), I* 1 ), |* 2 >, ... ), (1) 

where = and |^) = d\V)/doti for i > 0. The 
strategy of the SR method is to minimize the wavefunc- 
tion's energy by repeatedly operating by T = 1 — tH 
(the imaginary time evolution operator e~ rH expanded 
to first order), where r is a small number and H is the 
Hamiltonian. After each application of T, the result is 
projected into Vt to produce a new wavefunction of the 
form \^') = ^2 i Xi\^/ 1 }^ in which the coefficients x are 
given by 

(^|(l-rF)|*) = ^(* i |^)x i . (2) 

3 

Finally, because r is small, the new wavefunction \^') can 
be closely approximated by ^(a^a^? ...))> where o! i = 
<%i +Xi/xo. To summarize, one solves the linear equation 
given in Eq. ([2| and updates a accordingly, after which 
the subspace ft is redefined for the new wavefunction. 
This entire procedure is repeated until the energy of the 
wavefunction has converged. 

Previously, the SR overlap matrix Sij = was 
constructed explicitly. Here we will avoid building S en- 
tirely, relying instead on the CG algorithm to solve Eq. 

This method proceeds iteratively, using information 
gained from a series of matrix-vector multiplications to 
successively refine an approximation to the solution x in 
a space of ort honor mal conjugate vectors. The iteration 
proceeds until an arbitrary accuracy is achieved and typ- 
ically converges in a number of steps far smaller than the 
dimension of the matrix. To see the advantages of us- 
ing CG, consider the following expressions showing how 
the overlap matrix was previously constructed through 
stochastic sampling, 

|*> = (4) 

n 

\¥)=J2K\n). (5) 

n 

Here a resolution of the identity ^2 n \n)(n\ has been in- 
serted, creating a summation over all possible system 
configurations \n). By multiplying and dividing by \^ n \ 2 
the summation has been formulated so that it can be 
evaluated stochastically by sampling from the distribu- 
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FIG. 1: Relative energy errors for the CPS-Pfafflan ansatz on 
a periodic 4x4 Hubbard lattice. Statistical errors are smaller 
than the symbol size and lines are guides to the eye. 



tion l^l 2 /^!^). However, building S stochastically us- 
ing Eq. (I3J) takes at least 0(n s n 2 ) time, where n s is the 
number of samples and n v is the number of variational 
parameters. Using the CG algorithm we may avoid this 
cost by instead evaluating matrix-vector products of the 
form Sz. As with the expression for constructing S, this 
expression can be evaluated by stochastic sampling if we 
insert a resolution of the identity, 

By interchanging the order of summations we can rewrite 
this product as 

which can be evaluated in 0(n s n v ) time provided that 
the derivative ratios ^ l n /^ n have been pre-computed and 
stored, which is not difficult as the storage can be trivially 
divided between the different processors. While the CG 
algorithm does require multiple matrix-vector products 
to be evaluated, the number of such products will be 
much smaller than n Vl greatly improving the efficiency 
of the SR method. 



B. Accelerated Linear Method 

The linear method (LM), formulated by Nightingale 
for linear parameters 0] and later extended to optimize 
nonlinear parameters [5]-[7], works in the same subspace 
ft as the SR method but typically converges more rapidly 
to the energy minimum. It can be viewed as an approx- 
imate Newton method with a built in stabilization [7] 
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and often converges even more rapidly than the Newton 
method. Instead of using imaginary time evolution, the 
LM optimizes by finding the eigenstate of lowest en- 
ergy in the subspace. This eigenstate can be found by 
solving the following generalized eigenvalue problem, 



(8) 



where we now take x to be the coefficients of the desired 
eigenvector. Once these coefficients are found, the vari- 
ables ol can be updated to their new values in the same 
manner as in SR, though care must be taken to check that 
for large parameter changes the resulting parameters give 
an energy that is not higher outside of statistical errors. 
If they do not, the step can be scaled down by using a line 
search, or, rotated and scaled down by adding a diagonal 
shift. In practice, it is essential to modify the update in 
order to make it orthogonal to the original wavefunction, 
a procedure that can be completed using the information 
resulting from a single matrix- vector multiply involving 
the overlap matrix. 

As with SR, the eigenvector x can be found without ex- 
plicitly building the matrices H and S by using a Krylov 
subspace method, in this case the generalized Davidson 
algorithm [10]. As with CG, it is sufficient to evaluate 
the matrix-vector products of H and S with arbitrary 
trial vectors. For S, this product can be performed effi- 
ciently as explained above. For H, the difficulty of the 
multiplication depends on the complexity of the system's 
Hamiltonian, but for the relatively general case of the 
non-relativistic Born-Oppenheimer Hamiltonian an effi- 
cient evaluation is possible. If we assume a fixed particle 
number, we may use a matrix factorization such as the 
Cholesky decomposition [11] to express this Hamiltonian 
as 



H = ^2Y1 L$ q R£ a ala q ala a , 

/i pqrs 



(9) 



where the operator aj, (a p ) is the fermionic creation (de- 
struction) operator for the pth. spin orbital, the index fi 
has a range of 0(n 2 ) (n is the number of orbitals), and 
the indices g, r, s each have range n Q . In practice, the 
range of \i can often be taken to be much smaller than 
n 2 while still representing H with sufficient accuracy. By 
inserting an identity operator in the center of the Hamil- 
tonian, the matrix-vector product on the left hand side 
of Eq. ([8| can be written as 



(10) 



z_ my) 1 '!* 11 ™ y n y n x i 

jn^pqrs 

= Qnwi ^rs Q™ r *j X j ' 

n * ' ' pq fi rs j 

where we have defined the intermediate tensor Q nr sj — 
(n\ala s \tyi) /^ n . For the wavefunction presented in the 
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FIG. 2: The function eh(h) on an 8x8 Hubbard lattice with 
twist- averaged boundary conditions and U/t = 4. The pres- 
ence of a minimum implies that our ansatz predicts phase 
separation in the 2D Hubbard model. 



next section, this intermediate can be evaluated in 0(n^) 
time for a given configuration \n). If we sample the con- 
figurations |n) from the distribution |^ n | 2 /(^|^), we see 
that the entire matrix-vector product can be evaluated 
in 0(n s n^) time by performing the summations in the 
last line of Eq. (10) from right to left. We therefore see 



that like SR, the LM can be performed without explicitly 
constructing the matrices involved. 



C. Wavefunction Ansatz 

For our variational ansatz, we use a product of a cor- 
relator product state (CPS) tensor network [12] [13] and 
a pfaffian pairing wavefunction [T4HT6] . As discussed in 
Ref. [17 , the CPS ansatz can be expressed as a product 
of correlators acting on a reference wavefunction. Here 
we take the same approach, but with a pfaffian as the 
reference rather than a Slater determinant. The wave- 
function is written as 



N/2 



ij a i a ] 



|o>, 



(ii) 



where the operators C p are correlators, / is the pairing 
matrix, N is the number of electrons, and |0) is the vac- 
uum. The indices z, j range over all spin orbitals, so our 
pairing function creates both singlet and triplet pairs, un- 
like the more restrictive antisymmetrized geminal power 
[18] [19] . Two typical types of correlators are long range 
pairs and nxn square plaquettes. In each case, both the 
spin t and 4- versions of the spatial orbitals are included 
in a correlator, so the number of contained spin orbitals 
(variational parameters) is 4 (2 4 ) for a pair correlator and 
2n 2 (2 2n ) for a plaquette. 
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III. RESULTS 

Here we demonstrate the accelerated SR method by 
applying it to four example systems in conjunction with 
our CPS-pfaffian ansatz. In each system, we restrict our 
sampling to configurations with the correct total number 
of electrons and the correct total S z . The accelerated 
LM has yet to be implemented on a computer and thus 
will be tested in future work. 



A. 4x4 Hubbard Model 

In our first example we studied a 4x4 Hubbard lattice 
at half filling with periodic boundary conditions, which 
was chosen as it is an exactly soluble system that con- 
tains many of the challenging features of the general 2D 
Hubbard model. Two translationally invariant 3x3 corre- 
lators were used, one anchored on each sublattice, giving 
a wavefunction with a total of 524,784 variational param- 
eters. In Figure [I] we show the error relative to the exact 
result, which for all ratios U/t is less than 1%. 



B. 8x8 Hubbard Model 

We have also applied our method to test for phase 
separation in the 2D Hubbard model, the exact nature 
of which remains an interesting and unresolved problem 
in solid state physics. To do so we studied an 8x8 lattice 
with twist- averaged boundary conditions (TABC) [20]- 
[22] (we used 12 randomly chosen twists) and U/t = 4. 
We used translationally invariant 2x2 and long range pair 
correlators, again using separate correlators for each sub- 
lattice. To check whether the system phase separates, we 
computed the quantity e^(/i) employed in Ref. [22], which 



will display a minimum at the critical hole density h c if 
phase separation occurs. As seen in Figure [2| our ap- 
proach predicts that the system will phase separate with 
a critical hole density 0.14 < h c < 0.15. This result pro- 
vides a qualitative corroboration of the Constrained-Path 
Auxiliary Field QMC [22; results of Zhang et al, who pre- 
dicted phase separation with h c = 0.1 for the 8x8 lattice 
with TABC and U/t = 4. 



C. 4x4 Hydrogen Lattice 

As an example of a strongly correlated problem in- 
volving an ab initio Hamiltonian, we have studied a 4x4 
square lattice of hydrogen atoms in the STO-3G orbital 
basis [23 at various nearest-neighbor distances. As this 
system has open boundary conditions, we did not use 
translationally invariant correlators. Instead, we used all 
2x2 and long range pair correlators, which results in a 
wavefunction with 4,048 variational parameters. As seen 
in Figure [3j the results closely match those of the exact 
wavefunction. Even at the H-H distance with the worst 
error, our approach captures 98% of the correlation en- 
ergy, which we define as the energy difference between 
the restricted Hartree Fock and exact wavefunctions. 



D. Free Base Porphin 

As our final example, we computed the singlet-triplet 
gap of free base porphin in the 6-31G orbital basis [24] . 
This system was chosen as an important quantum chem- 
ical problem for which exact results in the active space 
are available for comparison. For both the singlet and 
triplet wavefunctions, the Is and a bonding orbitals re- 
sulting from a restricted Hartree Fock calculation were 
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FIG. 3: Total energies of a 4x4 hydrogen lattice. Statistical 
errors are smaller than the symbol size and lines are guides 
to the eye. 



FIG. 4: In addition to all long range pairs, we use the corre- 
lators shown here when treating free base porphin. 
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treated as a closed shell determinant, while the 24 out- 
of-plane 2p orbitals from the RHF solution were local- 
ized by the Pipek-Mezey [25 scheme to form an active 
space containing the remaining 26 electrons. This active 
space was treated with our CPS-pfafflan ansatz, with the 
correlators taken to be all pairs as well as those shown 
in Figure [4j for a total of 9,064 variational parameters. 
Holding the core orbitals frozen, we computed an active 
space singlet-triplet gap of 1.77eV, which compares very 
favorably with the converged spin-adapted density ma- 
trix renormalization group [26] result of 1.75eV. 

IV. CONCLUSIONS 

We have shown that by using the conjugate gradient 
iterative solver, it is possible to optimize hundreds of 
thousands of variational parameters with the stochastic 
reconfiguration algorithm in the context of variational 
Monte Carlo. In addition, we have shown how the gen- 



ii] W. M. C. Foulkes, L. Mitas, R. J. Needs, and G. Ra- 

jagopal, Rev. Mod. Phys. 73, 33 (2001). 
[2] C. J. Umrigar and C. Filippi, Phys. Rev. Lett. 94, 150201 

(2005). 

[3] S. Sorella, Phys. Rev. B 71, 241103 (2005). 

[4] M. P. Nightingale and V. Melik-Alaverdian, Phys. Rev. 

Lett. 87, 043401 (2001). 
[5] J. Toulouse and C. J. Umrigar, J. Chem. Phys. 126, 

084102 (2007). 

[6] C. J. Umrigar, J. Toulouse, C. Filippi, S. Sorella, and 
R. G. Hennig, Phys. Rev. Lett. 98, 110201 (2007). 

[7] J. Toulouse and C. J. Umrigar, J. Chem. Phys. 128, 
174101 (2008). 

[8] S. Sorella, Phys. Rev. B 64, 024512 (2001). 

[9] A. W. Sandvik and G. Vidal, Phys. Rev. Lett. 99, 220602 
(2007). 

[10] R. B. Morgan, J. Comput. Phys. 89, 241 (1990). 

[11] N. H. F. Beebe and J. Linderberg, Int. J. Quantum 

Chem. 12, 683 (1977). 
[12] H. J. Changlani, J. M. Kinder, C. J. Umrigar, and G. K.- 

L. Chan, Phys. Rev. B. 80, 245116 (2009). 
[13] F. Mezzacapo, N. Schuch, M. Boninsegni, and J. I. Cirac, 

New J. Phys. 11, 083026 (2009). 



eralized Davidson solver can be used to provide a similar 
improvement for the linear method. Using our acceler- 
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advances provide a powerful new method for modeling 
both quantum chemical and solid state systems. In the 
future, we expect optimizations of millions of parameters 
to be possible, which will allow even more sophisticated 
trial wavefunctions to be used in variational and diffusion 
Monte Carlo. 
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